Restaurant Review & Linguistic Features Analytics

Load Data

Preprocessing

Exploratory Data Analysis (EDA)

Distribution of data

Distribution across different locations

Kuala Lumpur has the most reviews; Miri has the fewest.

Rating Distribution

Ratings are skewed towards the high (positive sentiment) side: the average rating is 4.18.

Langkawi has the highest average review rating, whereas Melaka has the lowest.

Sentiment Distribution

More than 80% of the reviews are positive, whereas neutral and negative reviews account for around 10% each.
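The shares above can be computed from the sentiment labels directly. The following is a minimal sketch in plain Python; the function name `sentiment_shares` and the toy labels are illustrative assumptions, not the notebook's actual code or data.

```python
from collections import Counter

def sentiment_shares(labels):
    """Return the percentage of reviews per sentiment label (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100 * count / total for label, count in counts.items()}

# Toy labels for illustration only; not taken from the review dataset.
labels = ["positive"] * 8 + ["neutral"] + ["negative"]
shares = sentiment_shares(labels)
```

With a pandas DataFrame, a normalised value count on the sentiment column would likely give the same result.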

Top Restaurants

Overall Top restaurants based on average rating

This ranking may not be representative, as some restaurants with a high rating may have very few reviews.

Top restaurants: combined score for review count and rating, using the sum of ratings
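A minimal sketch of the sum-of-ratings score, assuming each review is a (restaurant, rating) pair. The helper `rank_by_rating_sum` and the restaurant names are hypothetical. Summing ratings rewards both a high rating and a large number of reviews, so a restaurant with a single 5-star review no longer outranks one with many good reviews.

```python
from collections import defaultdict

def rank_by_rating_sum(reviews):
    """Rank restaurants by the sum of their ratings (hypothetical helper).

    Returns rows of (name, rating_sum, review_count, mean_rating),
    sorted by rating_sum in descending order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, rating in reviews:
        totals[name] += rating
        counts[name] += 1
    rows = [
        (name, totals[name], counts[name], totals[name] / counts[name])
        for name in totals
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Made-up reviews: "A" has the best mean rating but only one review.
reviews = [("A", 5), ("B", 4), ("B", 5), ("B", 4), ("C", 3), ("C", 4)]
ranking = rank_by_rating_sum(reviews)
```

Here "A" has the highest average rating but ranks last, because its single review contributes a smaller sum than the three reviews of "B".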

Relationship between review length and rating

No strong correlation can be observed. The two are only slightly negatively correlated: longer reviews tend to have lower ratings.
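One way to quantify this relationship is the Pearson correlation between review length and rating. The sketch below assumes length is measured in words (the source does not say whether words or characters were used) and uses made-up reviews, so its value does not reflect the actual dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up (text, rating) pairs for illustration only.
sample = [
    ("Great food!", 5),
    ("Nice place, friendly staff", 5),
    ("The food was fine but the wait was far too long for what we got", 2),
    ("Okay overall, nothing special about it", 3),
]
lengths = [len(text.split()) for text, _ in sample]  # assumption: length in words
ratings = [rating for _, rating in sample]
r = pearson(lengths, ratings)
```

A value of `r` close to 0 indicates little linear relationship, while a negative value means longer reviews tend to carry lower ratings.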

Timeseries Analysis

Overall no. of reviews per month

Overall no. of reviews per month, broken down by location

It can be observed that the number of reviews dropped significantly at the beginning of 2020, which could correspond to the start of the COVID-19 pandemic in Malaysia.

The ups and downs after the start of COVID-19 may correspond to the Movement Control Orders (MCO) imposed by the Malaysian government.
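The monthly counts behind these plots can be sketched as follows, assuming each review carries a location and a date. The helper `monthly_counts` and the sample rows are hypothetical; with pandas, a group-by on location and month would likely be used instead.

```python
from collections import Counter
from datetime import date

def monthly_counts(reviews):
    """Count reviews per (location, 'YYYY-MM') pair (hypothetical helper)."""
    counts = Counter()
    for location, review_date in reviews:
        counts[(location, review_date.strftime("%Y-%m"))] += 1
    return counts

# Made-up rows for illustration only.
reviews = [
    ("Kuala Lumpur", date(2020, 1, 15)),
    ("Kuala Lumpur", date(2020, 1, 20)),
    ("Kuala Lumpur", date(2020, 4, 2)),
    ("Miri", date(2020, 1, 5)),
]
counts = monthly_counts(reviews)
```

Months with no reviews are simply absent from the counter and read as 0, which matters when plotting a period in which reviews dropped sharply.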

Linguistic Features Analytics

Unigram & WordCloud

Overall WordCloud

Unigram

Positive Reviews

Positive review word cloud

Neutral Reviews

Neutral review word cloud

Negative Reviews

Negative review word cloud

Combination of unigrams for 3 sentiments

It can be observed that the most common attributes across all three sentiments are food, place, and service.

Bigram

Bigram for Positive Reviews

Many positive bigrams can be observed, such as "good food", "food good", "good service", "must try", and "nice place".

Bigram for Neutral Reviews

The top bigram ("food good") is the same for both positive and neutral reviews.

This shows that the neutral sentiment class is rather ambiguous: some reviews containing "food good" are labelled positive, while others are labelled neutral. At the same time, negative bigrams such as "not great" and "not good" also appear within the neutral class.

However, some genuinely neutral bigrams can also be noticed, such as "not bad" and "food ok".

Bigram for Negative Reviews

Many negative bigrams can be observed, such as "not worth", "bad service", "not good", and "not recommended", along with many other negation bigrams, such as "food not", "not even", and "would not".
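Bigrams such as "food not" suggest that stopwords were removed while "not" was kept, although the source does not confirm this. The sketch below counts n-grams under that assumption; the stopword list, the helper names, and the sample reviews are illustrative only.

```python
import re
from collections import Counter

# Illustrative stopword list; "not" is deliberately kept so negation survives.
STOPWORDS = {"the", "a", "an", "is", "was", "it", "and", "to", "of"}

def ngrams(text, n):
    """Return the n-grams of a text after lowercasing and stopword removal."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(texts, n, k=5):
    """Return the k most frequent n-grams across a list of texts."""
    counter = Counter()
    for text in texts:
        counter.update(ngrams(text, n))
    return counter.most_common(k)

# Made-up negative reviews for illustration only.
negative = [
    "The food was not good",
    "Not worth the price",
    "Service was bad and food not good",
]
top = top_ngrams(negative, 2)
```

Calling `top_ngrams(negative, 3)` gives trigrams in the same way. Dropping "not" as a stopword would turn "not good" into "good", which is why negation words are kept here.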

Trigram

Positive Trigrams

Based on trigrams, service is the most common topic, followed by food and then price.

Neutral Trigrams

Negative Trigrams

It can be observed that most negative reviews mention not being worth the price or not being recommended, followed by mentions of food.

POS Tagging

Top Unigram POS Tags

Nouns

Adjectives

Verbs

Adverbs

Combined Table

The most common words for nouns, verbs, adjectives, and adverbs are shown in the combined table.

Positive POS

Neutral POS

Negative POS

Lexicon Polarity